# Robot Control

Pi0fast Base
Apache-2.0
π0+FAST is a vision-language-action model from Physical Intelligence built on FAST, an efficient action tokenization scheme designed for robotics.
Multimodal Fusion
lerobot
1,372
12
STEVE R1 7B SFT I1 GGUF
Apache-2.0
This is a weighted/imatrix quantized GGUF version of the Fanbin/STEVE-R1-7B-SFT model, suitable for resource-constrained environments.
Text-to-Image English
mradermacher
394
0
Magma 8B
MIT
Magma is a foundation model for multimodal AI agents. It takes image and text inputs, generates text outputs, and can carry out complex interactions in both virtual and real-world environments.
Image-to-Text Transformers
microsoft
4,526
363
Pi0
Apache-2.0
Pi0 is a general-purpose vision-language-action flow model for robot control.
Multimodal Fusion
lerobot
11.84k
230
Minivla History2 Vq Libero90 Prismatic
MIT
MiniVLA is a compact yet high-performance vision-language-action model. It is compatible with the Prismatic VLMs training scripts and suited to robotics and multimodal tasks.
Image-to-Text Transformers English
Stanford-ILIAD
22
1
VQ-BeT PushT
Apache-2.0
VQ-BeT (Vector-Quantized Behavior Transformer) is a behavior generation model trained on the PushT environment and built around vector-quantized latent actions.
Image Generation Transformers
lerobot
68
4
OpenVLA 7B
MIT
OpenVLA 7B is an open-source vision-language-action model trained on the Open X-Embodiment dataset. It generates robot actions from language instructions and camera images.
Image-to-Text Transformers English
openvla
1.7M
108
HPT Base
HPT (Heterogeneous Pre-trained Transformers) is a transformer model that aligns different robot embodiments into a shared latent space, with a focus on studying scaling behavior in policy learning.
Multimodal Alignment Transformers
liruiw
70
10
© 2025 AIbase